(54) Abstract Title

Method of extracting a signal from a contaminated signal 

(57) A method is provided of extracting desired signals $s_t$ from contaminated signals $y_t$ measured via
respective communication channels. The system comprising the desired signals $s_t$ and the channels is
modelled as a state space model. In the model, the desired signals have first time-varying characteristics which
vary more quickly than second time-varying characteristics of the channels. The method may be employed to
extract individual speech signals from speakers $s_t^{(1)}$ and $s_t^{(2)}$ in a room 1, contaminated signals $y_t$ being
sampled by microphones 6, 7, 8 and processed by a computer 9 which extracts the individual speech signals
for supply to respective loudspeakers 13, 14.

At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy.

METHOD OF EXTRACTING A SIGNAL

The present invention relates to a method of extracting a signal. Such a method may be 
used to extract one or more desired signals from one or more contaminated signals 
received via respective communications channels. Signals may be contaminated with 
noise, with delayed versions of themselves in the case of multi-path propagation, with 
other signals which may or may not also be desired signals, or with combinations of 
these. 

The communication path or paths may take any form, such as via cables, 
electromagnetic propagation and acoustic propagation. Also, the desired signals may in 
principle be of any form. One particular application of this method is to a system in 
which it is desired to extract a sound signal such as speech from contaminating signals 
such as noise or other sound signals, which are propagated acoustically. Another 
particular application of this method is to digital communication, for example where the 
signals represent digitised data and the or at least one channel is time-varying. 

According to a first aspect of the invention, there is provided a method of extracting at 
least one desired signal from a system comprising at least one measured contaminated 
signal and at least one communication channel via which the at least one contaminated 
signal is measured, comprising modelling the system as a state space model in which 
the at least one desired signal has first characteristics and the at least one 
communication channel has second characteristics which are different from the first 
characteristics. 

State space models are known in mathematics and have been applied to the solution of 
some practical problems. A state space model relates to a system in which there is an 
underlying state of the system which it is desired to estimate or extract. The state is 
assumed to be generated as a known function of the previous state value and a random 
error or disturbance term. The available measurements are also assumed to be a known 
function of the current state and another random error or noise term. 

It has been surprisingly found that a state space model may be successfully applied to
the problem of extracting one or more desired signals from a system comprising one or 
more measured contaminated signals and communication channels. It has been realised 
that the time varying characteristics of the or each desired signal differ from the time- 
varying characteristics of the or each communication channel in such a way that the or 
each desired signal can be extracted from the or each contaminated signal which is 
actually measured. This technique makes possible the extraction of one or more desired 
signals in a tractable way and in real time or on-line. A further advantage of this 
technique is that future samples are not needed in order to extract the samples of the 
desired signal or signals although, in some embodiments, there may be an advantage in 
using a limited selection of future samples. In this latter case, the samples of the desired 
signal or signals are delayed somewhat but are still available at at least the sampling rate 
of the measured signals and without requiring very large amounts of memory. 

The first characteristics may be fixed and known. For example, this technique may be
applied to echo cancellation. As an alternative, the first characteristics may be time- 
varying. Although the second characteristics may be unknown, they may be fixed and 
known. For example, the characteristics may have been previously determined by 
training data or in any other appropriate way. 

The second characteristics may be time-varying. The first time-varying characteristics
may vary on average more quickly than the second time-varying characteristics. For 
many systems, the characteristics of the desired signal or signals vary relatively rapidly 
whereas the characteristics of the communication channel or channels vary more slowly. 
Although there may be abrupt changes in the channel characteristics, such changes are 
relatively infrequent whereas signals such as speech have characteristics which vary 
relatively rapidly. By modelling these characteristics in such a way that the different 
rates of variation are modelled, the extraction of one or more signals is facilitated. 

The at least one communication channel may comprise a plurality of communication 
channels and the at least one contaminated signal may comprise a plurality of 
contaminated signals. The at least one desired signal may comprise a plurality of 
desired signals. The number of communication channels may be greater than or equal
to the number of desired signals. Although it is not necessary, it is generally preferred 
for the number of measured signals to be greater than or equal to the number of desired 
signals to be extracted. This improves the effectiveness with which the method can 
recreate the desired signal and, in particular, the accuracy of reconstruction or extraction 
of the desired signal or signals. 

The at least one contaminated signal may comprise a linear combination of time- 
delayed versions of at least some of the desired signals. The method is thus capable of 
extracting a desired signal in the case of multi-path propagation, signals contaminating 
each other, and combinations of these effects. 

The at least one contaminated signal may comprise the at least one desired signal 
contaminated with noise. Thus, the method can extract the or each desired signal from 
noise. The at least one channel may comprise a plurality of signal propagation paths of 
different lengths. 

The at least one desired signal may comprise a sound signal. The at least one sound 
signal may comprise speech. The contaminated signals may be measured by spatially 
sampling a sound field. For example, acousto-electric transducers such as microphones 
may be spatially distributed in, for example, a room or other space and the output 
signals may be processed by the method in order to extract or separate speech from one 
source in the presence of background noise or signals, such as other sources of speech 
or sources of other information-bearing sound. 

The at least one desired signal may be modelled as a time-varying autoregression. This
type of modelling is suitable for many types of desired signal and is particularly suitable 
for extracting speech. As an alternative, the at least one desired signal may be 
modelled as a moving average model. As a further alternative, the at least one desired 
signal may be modelled as a non-linear time-varying model. 



The at least one communication channel may be modelled as a time-varying finite 
impulse response model. This type of model is suitable for modelling a variety of 
propagation systems. As an alternative, the at least one communication channel may 
be modelled as an infinite impulse response model. As a further alternative, the at least 
one communication channel may be modelled as a non-linear time-varying model.

The state space model may have at least one parameter which is modelled using a 
probability model. The at least one desired signal may be extracted by a Bayesian 
inversion. The Bayesian inversion may be performed by a sequential Monte Carlo 
method. These techniques are particularly effective for complex models and are 
potentially implementable on parallel computers. 

According to a second aspect of the invention, there is provided a program for 
controlling a computer to perform a method as claimed in any one of the preceding 
claims. 

According to a third aspect of the invention, there is provided a carrier containing a 
program in accordance with the second aspect of the invention. 

According to a fourth aspect of the invention, there is provided a computer programmed 
by a program according to the second aspect of the invention. 

The invention will be further described, by way of example, with reference to the 
accompanying drawings, in which: 

Figure 1 is a diagram illustrating a signal source and a communication channel; 

Figure 2 is a diagram illustrating audio sound sources in a room and an apparatus for 
performing a method constituting an embodiment of the invention; and 

Figure 3 is a flow diagram illustrating a method constituting an embodiment of the 
invention. 

In order to extract a desired signal, a parametric approach is used in which the data are
assumed to be generated by an underlying unobserved process described at time $t$ with
the variable $x_t$. The variable $x_t$ contains information concerning the waveforms of the
different sources. The problem of extracting a desired signal may be too complex for a 
good deterministic model to be available, as little is known concerning the real structure 
of the problem. Alternatively, if a good deterministic model is available, this may lead 
to a large set of intractable equations. 

The models for the way sound is generated give expressions for the likely distribution
of the current 'state' $x_t$ given the value of the state at the previous time step $x_{t-1}$. This
probability distribution (known as the state transition distribution, or 'prior') is written
as $p(x_t \mid x_{t-1})$. How the current observed data depend on the current state is specified
through another probability distribution $p(y_t \mid x_t)$ (known as the observation distribution,
or 'likelihood'). Finally, how the state is likely to be distributed at the initial time
instant is specified by $p(x_0)$. Specific forms for these three distributions are given later.

A solution to circumvent the problems mentioned earlier comprises introducing
uncertainty in the equations through probability distributions. More precisely, this
means that, instead of assuming, in a discrete time set-up, that $x_{t+1}$ is a deterministic
function of past values, e.g. $x_{t+1} = A x_t$ where $A$ is a linear operator, the plausible regions
of the state space where the parameter can lie are described with a conditional
probability distribution $p(x_{t+1} = \chi \mid x_t)$, the probability of $x_{t+1}$ being equal to $\chi$ given the
previous value $x_t$. This may be expressed, for example, as $x_{t+1} = A x_t + v_t$, where $v_t$ is an
uncertainty or error distributed according to a Gaussian distribution around zero. The
way the distribution is spread indicates the degree of confidence in the deterministic
component $A x_t$; the narrower the spread, the greater the confidence. This type of
modelling proves robust enough to describe very complex processes while
remaining simple enough to be used in practice.

As mentioned above, whereas the structure of the process is assumed known (because
of plausible physical assumptions), the variable $x_t$ is not observed and solely
observations $y_t$ are available. In general, the observation mechanism, i.e. the
transformations that the process of interest undergoes before being observed, will
depend on the variable $x_t$, but again some randomness needs to be incorporated in the
description of the phenomenon, for example to take into account observation noise.
Again this is done by means of a probability distribution $p(y_t = y \mid x_t)$, the probability of $y_t$
being equal to $y$ given that the parameters of the underlying process are represented by
$x_t$. In the previous example, a possible observation process is of the form $y_t = C x_t + w_t$,
that is, some transformation of the parameter $x_t$ describing the internal state of the
process, corrupted by an additive observation noise $w_t$.
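
By way of illustration only, the linear Gaussian example above can be simulated in a few lines. The following Python sketch assumes a scalar state, so that the operators A and C reduce to the numbers `a` and `c`. The function name, the parameter names and the choice of zero-mean Gaussian noise for both error terms are assumptions made for the example.

```python
import random


def simulate_linear_gaussian(a, c, state_std, obs_std, x0, steps, seed=0):
    """Simulate x_{t+1} = a*x_t + v_t and y_t = c*x_t + w_t (scalar case).

    All names are hypothetical; v_t and w_t are zero-mean Gaussian
    with standard deviations state_std and obs_std.
    """
    rng = random.Random(seed)
    x = x0
    states, observations = [], []
    for _ in range(steps):
        # State transition: deterministic part plus random disturbance.
        x = a * x + rng.gauss(0.0, state_std)
        # Observation: transformed state plus additive observation noise.
        y = c * x + rng.gauss(0.0, obs_std)
        states.append(x)
        observations.append(y)
    return states, observations


if __name__ == "__main__":
    xs, ys = simulate_linear_gaussian(0.9, 1.0, 0.1, 0.5, 1.0, 5)
    for x, y in zip(xs, ys):
        print(f"state={x:+.3f}  observation={y:+.3f}")
```

Setting both standard deviations to zero recovers the purely deterministic model, which makes the role of the two noise terms easy to see.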

In the case of audio source separation, the parameter $x_t$ contains the value $s_t$ of the
desired waveforms of the sources at time $t$ and the evolving parameters of the sources
($a_t$) and the mixing system ($h_t$). This is illustrated diagrammatically in Figure 1 for a
single communication channel. Transition probability distributions are defined by:

$$p(a_{t+1} = a \mid a_t)$$
$$p(h_{t+1} = h \mid h_t)$$

and as it is not known how to characterize this evolution, apart from the fact that $a_t$ is
expected to evolve rapidly compared to $h_t$ (which might also evolve abruptly, but this is
expected to happen rarely):

$$a_{t+1} = a_t + v_{t+1}^a$$
$$h_{t+1} = h_t + v_{t+1}^h$$

where the distributions of $v_{t+1}^a$, $v_{t+1}^h$ are each isotropic, but have a different spread to
reflect the different non-stationarity time scales. The set of the parameters $a_t$ may be
thought of as evolving spectral characteristics of the sources and the parameters $h_t$ as
classical finite impulse response (FIR) filters. Now that the parameters of the system
are described, the waveforms may be assumed to evolve according to a probability
distribution:

$$p(s_{t+1} = s \mid s_t, a_t)$$

which describes the model of the sources.

One example of audio signal separation or extraction is where the sources are speech
sources. The parameter $x_t$ may then be modelled as a time-varying autoregressive
process, for example by the expression:

$$x_t = \sum_{i=1}^{p} a_{i,t}\, x_{t-i} + e_t$$

where the parameters $a_{i,t}$ are filter coefficients of an all-resonant filter system
representing the human vocal tract and $e_t$ represents noise produced by the vocal cords.
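
A time-varying autoregression of this kind may be sketched as follows. The sketch is illustrative only: the random-walk drift of the coefficients, the Gaussian excitation and all names (`simulate_tvar`, `coeff_std`, `excitation_std`, `initial_past`) are assumptions chosen for the example.

```python
import random


def simulate_tvar(coeffs0, initial_past, coeff_std, excitation_std, steps, seed=0):
    """Simulate x_t = sum_i a_{i,t} * x_{t-i} + e_t with drifting coefficients.

    coeffs0      -- initial coefficients [a_1, ..., a_p] (hypothetical values)
    initial_past -- [x_{-1}, ..., x_{-p}], most recent sample first
    coeff_std    -- spread of the assumed random-walk drift of the coefficients
    """
    rng = random.Random(seed)
    coeffs = list(coeffs0)
    past = list(initial_past)
    signal = []
    for _ in range(steps):
        # The coefficients drift slowly (assumed random walk).
        coeffs = [a + rng.gauss(0.0, coeff_std) for a in coeffs]
        # New sample: weighted sum of past samples plus excitation noise.
        x = sum(a * xp for a, xp in zip(coeffs, past))
        x += rng.gauss(0.0, excitation_std)
        signal.append(x)
        # Shift the memory so that past[0] is always the latest sample.
        past = [x] + past[:-1]
    return signal, coeffs


if __name__ == "__main__":
    sig, final_coeffs = simulate_tvar([1.2, -0.6], [0.0, 0.0], 0.005, 1.0, 8)
    print([round(v, 3) for v in sig])
    print([round(a, 3) for a in final_coeffs])
```

Nothing in the sketch keeps the drifting coefficients inside the stable region, so a long run with a large `coeff_std` may diverge.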

Figure 2 illustrates a typical application of the present method in which two desired
sound sources, such as individuals who are speaking, are represented by $s_t^{(1)}$ and $s_t^{(2)}$,
located in an enclosed space in the form of a room 1 having boundaries in the form of
walls 2 to 5. Microphones 6 to 8 are located at fixed respective positions in the room 1
and sample the sound field within the room to produce measured contaminated signals
$y_t^{(1)}$, $y_t^{(2)}$, $y_t^{(3)}$. The measured signals are supplied to a computer 9 controlled by a
program to perform the extraction method. The program is carried by a program carrier
10 which, in the example illustrated, is in the form of a memory for controlling the
operation of the computer 9. The computer extracts the individual speech signals and is
illustrated as supplying these via separate channels comprising amplifiers 11 and 12 and
loudspeakers 13 and 14. Alternatively or additionally, the extracted signals may be
stored or subjected to further processing.

Figure 2 illustrates some of the propagation paths from the sound sources to one of the 
microphones 7. In each case, there is a direct path 15, 16 from each sound source to the 
microphone 7. In addition, there are reflected paths from each sound source to the 
microphone 7. For example, one reflected path comprises a direct sound ray 17 from
the source which is reflected by the wall 2 to form a reflected ray 18. In general, there
are many reflected paths from each sound source to each microphone with the paths
being of different propagation lengths and of different sound attenuations. 

The objective is to estimate the sources $s_t$ given the observations $y_t$ and, to be of
practical use, it is preferable for the estimation to be performed on-line with as little
delay as possible. Whereas the modelling process describes the causes ($x_t$) that will
result in effects (the observations $y_t$), the estimation task comprises inverting the causes
and the effects. In the framework of probability modelling, this inversion can be
consistently performed using Bayes' theorem. In the context of statistical filtering, that
is the computation of the probability of $x_t$ given all the observations up to time $t$, namely
$p(x_t \mid y_{1:t})$, given the evolution probability $p(x_t \mid x_{t-1})$ and the observation probability
$p(y_t \mid x_t)$, the application of Bayes' rule yields

$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}}{p(y_t \mid y_{1:t-1})}$$

This equation is fundamental as it gives the recursion between the filtering density at
time $t-1$, i.e. $p(x_{t-1} \mid y_{1:t-1})$, and the filtering density at time $t$, $p(x_t \mid y_{1:t})$. The problem is
that, in practice, the integral cannot be computed in closed form and it is desired to
compute these quantities in an on-line or real time manner.
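
The structure of this recursion is easiest to see when the state can take only a finite number of values, because the integral then becomes a sum that can be evaluated exactly. The following Python sketch makes that simplifying assumption, which is not made by the method described here (its state is continuous, which is why Monte Carlo techniques are needed). The function name and the table layout are hypothetical.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One step of the filtering recursion on a finite state space.

    prior[i]         -- p(x_{t-1} = i | y_{1:t-1})
    transition[i][j] -- p(x_t = j | x_{t-1} = i)
    likelihood[j]    -- p(y_t | x_t = j) for the observation just received
    Returns the new filtering distribution and the evidence p(y_t | y_{1:t-1}).
    """
    n = len(prior)
    # Prediction: the integral over x_{t-1} becomes a sum.
    predicted = [
        sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)
    ]
    # Correction: multiply by the likelihood of the new observation.
    unnormalised = [likelihood[j] * predicted[j] for j in range(n)]
    # The normalising constant is the denominator of Bayes' rule.
    evidence = sum(unnormalised)
    posterior = [u / evidence for u in unnormalised]
    return posterior, evidence


if __name__ == "__main__":
    post, ev = bayes_filter_step(
        [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [0.2, 0.8]
    )
    print(post, ev)
```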



Whereas the use of probability distributions can be viewed as a way of representing in a 
parsimonious way the concentration of members of a population in certain regions of 
their feature space, Monte Carlo methods work in the opposite direction and rely on the 
idea that a probability distribution can be represented with an artificially generated set 
of samples distributed according to this distribution. This requires that the 
concentration of the samples in a given zone of the space of features is assumed to be 
representative of the probability of this zone under the distribution of interest. As 
expected, the larger the population, the more accurate the representation is. This 
approach possesses the advantage in many cases of greatly simplifying computation, 
which can to a large extent be performed in parallel. 



The following represents a basic explanation of the techniques involved in the present 
method. This is followed by a detailed description of a specific embodiment. 

It is assumed that, at time $t-1$, there are $N \gg 1$ members of a population of "particles"
distributed according to the distribution $p(x_{t-1} \mid y_{1:t-1})$. Then it is possible to guess
where the particles are going to evolve using the evolution distribution $p(x_t \mid x_{t-1})$ or prior,
from which it is typically easy to sample. Thus, 'scouts' are being sent into the regions
which are likely, according to the prior on the evolution of the parameters. When the
next observation $y_t$ is available, the prediction needs to be corrected as the 'scouts' did
not take into account the information brought by the new observation, which can be
quantified with $p(y_t \mid x_t)$. Some of the particles will be in regions of interest but
typically in insufficient numbers, or there might also be too many members of the
population in regions where many fewer should be.

A way of regulating the population consists of multiplying members in underpopulated
regions and suppressing members in overpopulated regions. This process is referred to
as a selection step. It can be proved mathematically that the quantity that will decide
the future of a 'scout' is the importance function:

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})}{\sum_{j=1}^{N} p(y_t \mid x_t^{(j)})}$$

and valid mechanisms include taking the nearest integer number to $N w_t^{(i)}$ to determine
the number of "children" of scout number $i$ or randomly choosing the particle $x_t^{(i)}$ with
probability $w_t^{(i)}$. After the selection process, the children are approximately distributed
according to $p(x_t \mid y_{1:t})$ and the next data can be processed. This is a very general
description of the algorithm, and many improvements are possible as described
hereinafter for a very specific case.
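
The two selection mechanisms just mentioned may be sketched as follows. This is a minimal illustration using only the Python standard library. The function names are hypothetical, and the weights are assumed to be already normalised so that they sum to one.

```python
import random


def children_counts(weights):
    """Deterministic selection: nearest integer to N * w for each particle.

    The counts do not necessarily sum to N, so a practical scheme would
    need a correction step that is not shown here.
    """
    n = len(weights)
    return [round(n * w) for w in weights]


def select(particles, weights, rng):
    """Random selection: draw N particles, each with probability w."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [-1.0, 0.0, 1.0, 2.0]
    weights = [0.5, 0.25, 0.25, 0.0]
    print(children_counts(weights))
    print(select(particles, weights, rng))
```

A particle with zero weight is never chosen by either mechanism, which is the "suppression" of members in regions that the new observation has made implausible.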

A general recipe for how to do sequential Monte Carlo estimation is now given. This is the
most basic form of the method and is described in N.J. Gordon, D.J. Salmond and
A.F.M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation",
IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993, the contents of which are
incorporated herein by reference.



In an initial step 20, time $t$ is set to zero. $N$ so-called "particles" are randomly chosen
from the initial distribution $p(x_0)$ and labelled $x_0^{(k)}$, where $k$ is an integer from 1 to $N$.
The number $N$ of particles is typically very large, for example of the order of 1,000.
The steps 20 and 21 represent initialisation of the procedure.

State propagation is performed in a step 22. In this step, for each particle from the
current time $t$, an update to the next time $t+1$ is randomly chosen according to the
distribution $p(x_{t+1} \mid x_t^{(k)})$. The updated particles $x_{t+1}^{(k)}$ have plausible values which lie in
the expected region of the space according to the state transition distribution.

In a step 23, the next measured value $y_{t+1}$ is obtained from measuring or sampling the
sound field. In a step 24, the weights of the particles are calculated. Because the
updated particles $x_{t+1}^{(k)}$ were generated without reference to the new measurement $y_{t+1}$, it
is necessary to calculate a weight between zero and one for each particle representing
how "good" each particle actually is in the light of the new measurement. The correct
weight value for each particle is proportional to the value of the observation distribution
for the particle. However, the sum of all of the weights is required to be equal to one.
Accordingly, the weight of each particle $x_{t+1}^{(k)}$ is given by:

$$w_{t+1}^{(k)} = \frac{p(y_{t+1} \mid x_{t+1}^{(k)})}{\sum_{j=1}^{N} p(y_{t+1} \mid x_{t+1}^{(j)})}$$



A step 25 then reselects or resamples the particles in order to return to a set of particles 
without attached weights. The step 25 reselects N particles according to their weights 
as described in more detail hereinafter. A step 26 then increments time and control 
returns to the step 22. Thus, the steps 22 to 26 are repeated for as long as the method is
in use. 
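
The loop formed by the steps 20 to 26 may be sketched as follows for a deliberately simple model. The scalar linear Gaussian model, the parameter names and the function name are assumptions made for this example only; the sketch is not the audio-specific algorithm described later. The comments indicate which step of Figure 3 each part is intended to mirror.

```python
import math
import random


def bootstrap_filter(observations, n_particles, a, c,
                     state_std, obs_std, init_std, seed=0):
    """Basic sequential Monte Carlo filter for the assumed model
    x_{t+1} = a*x_t + v,  y_t = c*x_t + w  (v, w zero-mean Gaussian)."""
    rng = random.Random(seed)
    # Steps 20-21: initialise time and draw particles from p(x_0).
    particles = [rng.gauss(0.0, init_std) for _ in range(n_particles)]
    estimates = []
    for y in observations:  # Step 23: each pass receives a new measurement.
        # Step 22: propagate every particle through the state transition.
        particles = [a * x + rng.gauss(0.0, state_std) for x in particles]
        # Step 24: weights proportional to the observation distribution,
        # computed in the log domain to avoid numerical underflow.
        log_w = [-0.5 * ((y - c * x) / obs_std) ** 2 for x in particles]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Weighted average of the particles as a point estimate.
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # Step 25: resample N particles according to their weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
        # Step 26: the loop itself advances time.
    return estimates


if __name__ == "__main__":
    est = bootstrap_filter([3.0] * 30, 500, 1.0, 1.0, 0.2, 0.5, 5.0)
    print(round(est[-1], 3))
```

Subtracting the largest log-weight before exponentiating leaves the normalised weights unchanged but prevents all of them from rounding to zero when no particle lies close to the measurement.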

The value of each sample may, for example, be calculated in accordance with the
posterior mean estimator as:

$$\hat{x}_{t+1} = \sum_{k=1}^{N} w_{t+1}^{(k)}\, x_{t+1}^{(k)}$$


The techniques described hereinbefore are relatively general and are not particularly 
limited to the case of separating or extracting speech or other audio signals from 
"contaminated" measurements. A method directed more explicitly to extracting audio 
source signals will now be described in greater detail. 

The sources are assumed to be modelled with time-varying autoregressive processes, i.e.
source number $i$ at time $t$ is a linear combination of $p_i$ (the so-called order of the model)
past values of the same source ($s_{i,t-1}, \ldots, s_{i,t-p_i}$, or in short and vector form $s_{i,t-1:t-p_i}$)
perturbed by a noise, here assumed Gaussian, $v_{i,t}^s$:

$$s_{i,t} = a_{i,t}^T\, s_{i,t-1:t-p_i} + v_{i,t}^s \qquad (1)$$

This type of process is very flexible and allows for resonances to be modelled. The
coefficients $a_{i,t}$ of the linear combination are time-varying in order to take into account
the non-stationarity of the resonances of speech:

$$a_{i,t+1} = a_{i,t} + v_{i,t+1}^a \qquad (2)$$



The observations, which consist of the superimposition of the different sources
(possibly delayed) at microphone $j$ at time $t$, are assumed to be generated by the following
process

$$y_{j,t} = \sum_{i} h_{i,j,t}^T\, s_{i,t:t-l_{ij}+1} + w_{j,t} \qquad (3)$$

i.e. the observation is the sum over all the sources of filtered versions of the sources (the
filter is of length $l_{ij}$ from source $i$ to microphone $j$, and introduces delays) perturbed by
an observation noise $w_{j,t}$. The transfer function from source $i$ to microphone $j$ evolves
in time according to the equation

$$h_{i,j,t+1} = h_{i,j,t} + v_{i,j,t+1}^h$$

It is assumed here that the characteristics of $v_{i,t+1}^s$ and $w_{j,t}$ are known, whereas in the full
version of the algorithm these parameters are also estimated.
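
The observation process at a single microphone may be illustrated by the following sketch, which forms the sum over the sources of FIR-filtered source samples. For simplicity the filter taps are held fixed rather than evolving in time, and samples before time zero are taken to be zero. Both simplifications, and the function name, are assumptions of the example.

```python
def mix_at_sensor(sources, filters, t, noise=0.0):
    """Observation at one microphone at time t.

    sources[i] -- list of samples of source i (index = time)
    filters[i] -- FIR taps from source i to this microphone;
                  tap number `lag` multiplies the sample at time t - lag
    noise      -- additive observation noise sample
    """
    total = noise
    for samples, taps in zip(sources, filters):
        for lag, tap in enumerate(taps):
            if t - lag >= 0:  # samples before time zero are treated as zero
                total += tap * samples[t - lag]
    return total


if __name__ == "__main__":
    speech_a = [1.0, 2.0, 3.0]
    speech_b = [4.0, 5.0, 6.0]
    # Source A arrives directly plus a half-strength echo one sample later;
    # source B arrives only after a one-sample delay.
    taps = [[1.0, 0.5], [0.0, 1.0]]
    print(mix_at_sensor([speech_a, speech_b], taps, t=2))
```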

The steps in the sequential Monte Carlo algorithm for audio source separation are 
essentially as shown in Figure 3 but are changed or augmented as described hereinafter. 
The state vector x t is defined as a stacked vector containing all the sources, 
autoregressive parameters and transfer functions between sources and microphones. 

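The initial values are randomly chosen from a normal distribution with a large spread.

In the step 22, random noise $v_{i,t+1}^{s(k)}$, $v_{i,t+1}^{a(k)}$ and $v_{i,j,t+1}^{h(k)}$ are generated from Gaussian
distributions, for example as disclosed in B.D. Ripley, "Stochastic Simulation", Wiley,
N.Y. 1987, the contents of which are incorporated herein by reference. These quantities
are then added to their respective state variables:

$$s_{i,t+1}^{(k)} = a_{i,t+1}^{(k)T}\, s_{i,t:t-p_i+1}^{(k)} + v_{i,t+1}^{s(k)}$$
$$a_{i,t+1}^{(k)} = a_{i,t}^{(k)} + v_{i,t+1}^{a(k)}$$
$$h_{i,j,t+1}^{(k)} = h_{i,j,t}^{(k)} + v_{i,j,t+1}^{h(k)}$$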
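
The step 24 calculates un-normalised weights as follows:

$$w_{t+1}^{(k)} = \exp\left( -\sum_{j} \frac{1}{2\sigma_{w,j}^2} \left( y_{j,t+1} - \sum_{i} h_{i,j,t+1}^{(k)T}\, s_{i,t+1:t-l_{ij}+2}^{(k)} \right)^2 \right)$$

where $\sigma_{w,j}^2$ denotes the variance of the observation noise at microphone $j$.

The weights are then renormalised in accordance with the expression:

$$\tilde{w}_{t+1}^{(k)} = \frac{w_{t+1}^{(k)}}{\sum_{l=1}^{N} w_{t+1}^{(l)}}$$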

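The variability of the variances of the excitation noise and the observation noise may be
taken into account by defining evolution equations on the $\phi_{v,i,t+1} = \log(\sigma_{v,i,t+1}^2)$ and
$\phi_{w,j,t+1} = \log(\sigma_{w,j,t+1}^2)$ as follows

$$\phi_{v,i,t+1} = \phi_{v,i,t} + v_{i,t+1}^{\phi_v}$$
$$\phi_{w,j,t+1} = \phi_{w,j,t} + v_{j,t+1}^{\phi_w}$$

where $v_{i,t+1}^{\phi_v}$ and $v_{j,t+1}^{\phi_w}$ are i.i.d. sequences distributed according to Gaussian
distributions.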

It is then possible to improve the performance with some or all of several modifications 
to the basic algorithm as follows: 

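(a) Estimation of the time-varying noise variances. Generate random noise
$v_{i,t+1}^{\phi_v(k)}$, $v_{j,t+1}^{\phi_w(k)}$ from normal distributions and compute

$$\phi_{v,i,t+1}^{(k)} = \phi_{v,i,t}^{(k)} + v_{i,t+1}^{\phi_v(k)}$$
$$\phi_{w,j,t+1}^{(k)} = \phi_{w,j,t}^{(k)} + v_{j,t+1}^{\phi_w(k)}$$

which are transformed with an exponential to obtain the variances of the noises, i.e.
$\sigma_{v,i,t+1}^{2(k)} = \exp(\phi_{v,i,t+1}^{(k)})$ and $\sigma_{w,j,t+1}^{2(k)} = \exp(\phi_{w,j,t+1}^{(k)})$.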
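
The benefit of evolving the logarithm of a variance, rather than the variance itself, is that the exponential always returns a positive value. The following sketch illustrates this under the assumption that the log-variance follows a simple Gaussian random walk. The function name and the step size are hypothetical.

```python
import math
import random


def propagate_log_variance(phi, step_std, rng):
    """Move a log-variance by one assumed random-walk step.

    Returns the new log-variance and the corresponding variance,
    which is positive by construction.
    """
    phi_next = phi + rng.gauss(0.0, step_std)
    return phi_next, math.exp(phi_next)


if __name__ == "__main__":
    rng = random.Random(0)
    phi = math.log(0.25)  # start from a variance of 0.25
    for _ in range(5):
        phi, variance = propagate_log_variance(phi, 0.1, rng)
        print(round(variance, 4))
```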

(b) By using the structure of the model, the mixing filter coefficients and the
autoregressive coefficients can be integrated out. The implication is that the calculation
of the weight is modified to

$$w_t^{(k)} \propto p\left(y_t \mid y_{1:t-1}, x_{0:t}^{(k)}, \sigma_{v,1:t}^{2(k)}, \sigma_{w,1:t}^{2(k)}\right)$$

This weight is calculated sequentially by use of a standard procedure, the Kalman filter
(see Appendix Eq. (50)), applied to the state space model for which the state consists of
the autoregressive coefficients and mixing filter coefficients shown in the following
equations (17) to (19). The advantage of this method is that the number of parameters
to be estimated is significantly reduced and the statistical efficiency is much improved.
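
To indicate how a Kalman filter can supply such a weight, the following sketch performs one prediction and update for a scalar linear Gaussian model and returns the logarithm of the predictive likelihood of the new observation. It is a generic textbook step under assumed scalar dynamics, not a reproduction of the filter given in the Appendix. All names are hypothetical.

```python
import math


def kalman_step(mean, var, y, a, q, c, r):
    """One Kalman prediction/update for the assumed scalar model
    z_{t+1} = a*z_t + N(0, q),  y_t = c*z_t + N(0, r).

    Returns the filtered mean, the filtered variance and
    log p(y_t | y_{1:t-1}), which can serve as a log-weight.
    """
    # Prediction of the hidden parameter.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Innovation (prediction error) and its variance.
    innovation = y - c * pred_mean
    innovation_var = c * c * pred_var + r
    # Update with the Kalman gain.
    gain = pred_var * c / innovation_var
    new_mean = pred_mean + gain * innovation
    new_var = (1.0 - gain * c) * pred_var
    # Gaussian log-density of the innovation.
    log_lik = -0.5 * (
        math.log(2.0 * math.pi * innovation_var)
        + innovation * innovation / innovation_var
    )
    return new_mean, new_var, log_lik


if __name__ == "__main__":
    print(kalman_step(0.0, 1.0, 1.0, a=1.0, q=0.0, c=1.0, r=1.0))
```

Because the parameters are handled analytically in this way, only the remaining quantities need to be represented by particles.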

(c) The distribution that propagates the values of the sources is modified to an
approximation of the filtering density $p(x_t \mid y_{1:t}, a_{1:t}, h_{1:t}, \sigma_{v,1:t}^2, \sigma_{w,1:t}^2)$, which, in
contrast to the basic algorithm, takes into account the new observation $y_t$, hence leading
to improved statistical efficiency once again. This density is the byproduct of a Kalman
filter (say Kalman filter #2) applied to a state space model for which the state consists of
the sources $x_t$ and the parameters shown in the following equation (14) depend upon the
filtered mean estimate of the mixing coefficients $h$ and autoregressive coefficients $a$,
previously obtained from Kalman filter #1 (see Appendix).

(d) Diversity among the particles is introduced by using a Metropolis-Hastings update
on the particle whose target distribution is $p(x_{0:t}, \sigma_{v,1:t}^2, \sigma_{w,1:t}^2 \mid y_{1:t})$.

This step is introduced after the step 25 and the details are given hereinafter.

(e) Estimation is delayed for improvement purposes, i.e. to compute the value of the
sources at time $t$, wait and take into account some future observations $y_{t+1}, \ldots, y_{t+L}$ for
some $L$. Such a technique is called fixed-lag smoothing and does not modify the
algorithm as it is merely necessary to wait for the data $y_{t+L}$.

The full details of these different steps are given hereinafter. 

* 

The whole system including sources and channels is modelled as a state space system, a
standard concept from control and signal processing theory. The definition of a state
space system is that there is some underlying state in the system at time $t$, denoted $x_t$,
which it is desired to estimate or extract. In the present case, this comprises the
underlying desired sources themselves. The state is assumed to be generated as a
known function of the previous state value and a random error or disturbance term:

$$x_{t+1} = A_{t+1}^x(x_t, v_{t+1}^x)$$

where $A_{t+1}^x(\ldots)$ is a known function which represents the assumed dynamics of the state
over time. Similarly the measurements at time $t$, denoted $y_t$, are assumed to be a known
function of the current state and another random error or noise term:

$$y_t = C_t^x(x_t, w_t^x)$$

where $C_t^x(\ldots)$ is a known function representing the contaminating process. In the
present case, $C_t^x(\ldots)$ represents the filtering effects of the channel(s) on the source(s)
and $w_t^x$ represents the measurement noise in the system.

A key element of the present method is that the source(s) and channel(s) have
different time-varying characteristics which also have to be estimated. In other words,
the functions $A_{t+1}^x$ and $C_t^x$ themselves depend upon some unknown parameters, say
$\theta_t^A$ and $\theta_t^C$. In the present case, $\theta_t^A$ represents the unknown time-varying autoregressive
parameters of the sources and $\theta_t^C$ represents the unknown time-varying finite impulse
response filter(s) of the channel(s). These time-varying characteristics are modelled
with additional state transition functions, as described below.

The problem addressed is the problem of source separation, the $n$ sources being
modelled as autoregressive (AR) processes, from which, at each time $t$, there are $m$
observations which are convolutive mixtures of the $n$ sources.
Source $i$ can be modelled for $t = 1, \ldots$ as:

$$s_{i,t} = a_{i,t}^T\, s_{i,t-1:t-p_i} + \sigma_{v,i,t}\, v_{i,t}^s \qquad (4)$$

and $p_i$ is the order of the $i$th AR model. It is assumed that $s_{i,-p_i+1:0} \sim N(m_0^s, P_0^s)$
for $i = 1, \ldots, n$. $(v_{i,t}^s)_{t=1,\ldots}$ is a zero mean normalized i.i.d. Gaussian
sequence, i.e. for $i = 1, \ldots, n$ and $t = 1, \ldots$

$$v_{i,t}^s \sim N(0, 1) \qquad (5)$$

The $(\sigma_{v,i,t}^2)_{i=1,\ldots,n}$ are the variances of the dynamic noise for each source at time $t$.

It is assumed that the evolving autoregressive model follows a linear Gaussian state-
space representation,

$$a_{i,t+1} = A_i^a\, a_{i,t} + B_i^a\, v_{i,t+1}^a \qquad (6)$$
$$\phi_{v,i,t+1} = A_i^{\phi_v}\, \phi_{v,i,t} + B_i^{\phi_v}\, v_{i,t+1}^{\phi_v} \qquad (7)$$

where $\phi_{v,i,t} = \log(\sigma_{v,i,t}^2)$,
and $a_{i,0} \sim N(m_0^a, P_0^a)$. The model may include switching between different sets of
matrices $A_i^a$, $B_i^a$ to take into account silences, for example. Typically $A_i^a = I_{p_i}$.



The mixing model is assumed to be a multidimensional time-varying FIR filter. It is
assumed that the sources are mixed in the following manner, and corrupted by an
additive Gaussian i.i.d. noise sequence. At the $j$th sensor, and for $j = 1, \ldots, m$:

$$y_{j,t} = \sum_{i=1}^{n} h_{i,j,t}^T\, s_{i,t:t-l_{ij}+1} + \sigma_{w,j,t}\, w_{j,t} \qquad (8)$$

where $l_{ij}$ is the length of the filter from source $i$ to sensor $j$. The series $(w_{j,t})_{t=1,\ldots}$ is a
zero mean normalized i.i.d. Gaussian sequence, i.e. for $j = 1, \ldots, m$ and $t = 1, \ldots$

$$w_{j,t} \sim N(0, 1) \qquad (9)$$

The quantities $(\sigma_{w,j,t}^2)_{j=1,\ldots,m}$ are the variances of the observation noise for each sensor at
time $t$. The $w_{j,t}$ are assumed independent of the excitations of the AR models. The
constraints $[h_{i,j,t}]_{1,1} = 1$ and $[h_{i,j,t}]_{k,1} = 0$ for $j = i \bmod m$ and $i = 1, \ldots, n$ are imposed. In
the case $m = n$, this constraint corresponds to $[h_{i,i,t}]_{1,1} = 1$ and $[h_{i,j,t}]_{k,1} = 0$ for $j = i$. As
for the model of the sources, it is assumed that the observation system also follows a
state-space representation. Writing $\phi_{w,j,t} = \log(\sigma_{w,j,t}^2)$;

*^v*.=^*^ (io) 

with 



and - A^/?). 



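For each of the sources $i$, the signal can be rewritten in the following form:

$$s_{i,t:t-\lambda_i+1} = A_{i,t}^s\, s_{i,t-1:t-\lambda_i} + B_{i,t}^s\, v_{i,t}^s$$

where $\lambda_i \triangleq \max\{p_i, \max_j\{l_{ij}\}\}$ and $A_{i,t}^s$ is a $\lambda_i \times \lambda_i$ matrix defined as:

$$A_{i,t}^s \triangleq \begin{pmatrix} a_{i,t}^T & 0_{1 \times (\lambda_i - p_i)} \\ I_{\lambda_i - 1} & 0_{(\lambda_i - 1) \times 1} \end{pmatrix} \qquad (12)$$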
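
The effect of rewriting an autoregression in this stacked form can be checked numerically. The sketch below assumes that the matrix takes the usual companion form, with the autoregressive coefficients (padded with zeros) in the first row and ones on the sub-diagonal. That form, the function names and the placement of the excitation in the first component are assumptions of the example.

```python
def companion_matrix(ar_coeffs, size):
    """Build a size x size companion matrix from AR coefficients.

    The first row holds the coefficients padded with zeros; the ones on
    the sub-diagonal shift the older samples down by one position.
    """
    order = len(ar_coeffs)
    if size < order:
        raise ValueError("size must be at least the AR order")
    matrix = [[0.0] * size for _ in range(size)]
    matrix[0][:order] = list(ar_coeffs)
    for row in range(1, size):
        matrix[row][row - 1] = 1.0
    return matrix


def advance(matrix, past, excitation):
    """Apply the matrix to the stacked past samples (latest first) and
    add the excitation to the newest sample only."""
    new = [sum(m * x for m, x in zip(row, past)) for row in matrix]
    new[0] += excitation
    return new


if __name__ == "__main__":
    A = companion_matrix([0.5, 0.25], 3)
    print(A)
    print(advance(A, [1.0, 2.0, 3.0], 0.5))
```

The first component of the result is the ordinary autoregressive prediction plus the excitation, and the remaining components are simply the previous samples shifted by one place.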



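The dynamic system equations can be rewritten as:

$$x_{t+1} = A_{t+1}^x\, x_t + B_{t+1}^x\, v_{t+1}^x \qquad (13)$$
$$y_t = C_t^x\, x_t + D_t^x\, w_t^x \qquad (14)$$

with

$$A_t^x \triangleq \mathrm{diag}(A_{1,t}^s, \ldots, A_{n,t}^s), \quad B_t^x \triangleq \mathrm{diag}(B_{1,t}^s, \ldots, B_{n,t}^s),$$
$$v_t^x \triangleq (v_{1,t}^s, \ldots, v_{n,t}^s)^T, \quad x_t \triangleq (s_{1,t:t-\lambda_1+1}^T, \ldots, s_{n,t:t-\lambda_n+1}^T)^T \qquad (15)$$

$[C_t^x]$ contains the mixing system
and $w_t^x \triangleq (w_{1,t}, \ldots, w_{m,t})^T$, $D_t^x \triangleq \mathrm{diag}(\sigma_{w,1,t}, \ldots, \sigma_{w,m,t})$.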



Defining the "stacked" parameter vectors 

kly-y''-'/ + *,iA[/i..y,L' = ^ , = 1. = ...,/, (16) 

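and $l_a \triangleq \sum_{i=1}^{n} p_i$, $l_h \triangleq \sum_{i=1}^{n} \sum_{j=1}^{m} l_{ij}$, it is possible to consider the following state
space representations.

$$h_{t+1} = A_t^h\, h_t + B_t^h\, v_{t+1}^h \qquad (17)$$
$$y_t = C_t^h\, h_t + D_t^h\, w_t^h \qquad (18)$$

where $A_t^h, B_t^h \in \mathbb{R}^{l_h \times l_h}$, $C_t^h \in \mathbb{R}^{m \times l_h}$, $D_t^h \in \mathbb{R}^{m \times m}$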



1 = 1, o 

and 

$$x_t = C^a_t a_t + D^a_t v^x_t \tag{19}$$



with A°B? e R } -"-C? eR™ 1 - where 



(20) 



This is of practical interest as it allows the mixing filters and autoregressive filters to be
integrated out because, conditional upon the variances and states x t , the systems (17) 
and (19) are linear Gaussian state space models. Together with (14) they define a 
bilinear Gaussian process. 

Given the number of sources m, $p_i$ and $l_{ij}$, it is required to estimate sequentially the
sources $(x_t)_{t=1,\ldots}$ and their parameters $\theta_t \triangleq \{a_t, h_t, \sigma^2_{v,1:m,t}, \sigma^2_{w,1:n,t}\}$
from the observations $y_{1:t+L}$. More precisely, in the framework of Bayesian estimation,
one is interested in the recursive, in time, estimation of posterior distributions of the
type $p(dx_t, d\theta_t \mid y_{1:t+L})$: when L = 0 this corresponds to the filtering distribution and when
L > 0 this corresponds to the fixed-lag smoothing distribution. This is a very complex 
problem that does not admit any analytical solution, and it is necessary to resort to 
numerical methods. Such a numerical method is based on Monte Carlo simulation. 
Subsequently the following notation will be used:

A simulation-based optimal filter/fixed-lag smoother is used to obtain filtered/fixed-lag
smoothed estimates of the unobserved sources and their parameters of the type

$$I_L(f_t) \triangleq \int f_t(x_t, \theta_t)\, p(dx_t, d\theta_t \mid y_{1:t+L}) \tag{21}$$

The standard Bayesian importance sampling method is first described, and then it is
shown how it is possible to take advantage of the analytical structure of the model by
integrating out the parameters $a_t$ and $h_t$, which can be high dimensional, using Kalman
filter related algorithms. This leads to an elegant and efficient algorithm for which the 
only tracked parameters are the sources and the noise variances. Then a sequential 
version of Bayesian importance sampling for optimal filtering is presented, and it is 
shown why it is necessary to introduce selection as well as diversity in the process. 
Finally, a Monte Carlo filter/fixed-lag smoother is described. 

For any $f_t$ it will subsequently be assumed that $|I_L(f_t)| < +\infty$. Suppose that it is possible
to sample N i.i.d. samples, called particles, $(x^{(i)}_{0:t+L}, \theta^{(i)}_{0:t+L})$ according to
$p(x_{0:t+L}, \theta_{0:t+L} \mid y_{1:t+L})$. Then an empirical estimate of this distribution is given by

$$p_N(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}_{0:t+L},\,\theta^{(i)}_{0:t+L}}(dx_{0:t+L}, d\theta_{0:t+L}) \tag{22}$$

so that a Monte Carlo approximation of the marginal distribution $p(dx_t, d\theta_t \mid y_{1:t+L})$
follows as

$$p_N(dx_t, d\theta_t \mid y_{1:t+L}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}_t,\,\theta^{(i)}_t}(dx_t, d\theta_t) \tag{23}$$

Using this distribution, an estimate of $I_L(f_t)$ for any $f_t$ may be obtained as

$$I_{L,N}(f_t) = \int f_t(x_t, \theta_t)\, p_N(dx_t, d\theta_t \mid y_{1:t+L}) = \frac{1}{N}\sum_{i=1}^{N} f_t(x^{(i)}_t, \theta^{(i)}_t) \tag{24}$$

This estimate is unbiased and, from the strong law of large numbers,
$I_{L,N}(f_t) \to I_L(f_t)$ almost surely as $N \to +\infty$. Under additional assumptions, the estimates satisfy a central limit
theorem. The advantage of the Monte Carlo method is clear. It is easy to estimate $I_L(f_t)$
for any $f_t$, and the rate of convergence of this estimate does not depend on t or the
dimension of the state space, but only on the number of particles N and the
characteristics of the function $f_t$. Unfortunately, it is not possible to sample directly
from the distribution $p(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$ at any t, and alternative strategies need to
be investigated.
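By way of illustration only, the plain Monte Carlo estimate may be sketched on a toy problem in which direct sampling is possible. The target (a standard normal distribution), the test function and the name `monte_carlo_estimate` are assumptions chosen for the example and are unrelated to the source separation model.

```python
import random

def monte_carlo_estimate(f, sampler, n, rng):
    """Average of f over n i.i.d. draws from the target distribution."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

rng = random.Random(0)
# Estimate E[x^2] for x ~ N(0, 1); the exact value is 1.
estimate = monte_carlo_estimate(lambda x: x * x,
                                lambda r: r.gauss(0.0, 1.0),
                                20000, rng)
```

The error of such an estimate shrinks with the number of particles regardless of the dimension of the quantity being sampled, which is the property relied upon above.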



One solution to estimate $p(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$ and $I_L(f_t)$ is the well-known
Bayesian importance sampling method as disclosed in A.
Doucet, S.J. Godsill and C. Andrieu, "On sequential Monte Carlo sampling methods for
Bayesian filtering", Statistics and Computing, 2000, the contents of which are
incorporated herein by reference. This method assumes the existence of an arbitrary
importance distribution $\pi(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$ which is easily simulated from, and
whose support contains that of $p(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$. Using this distribution $I_L(f_t)$
may be expressed as

$$I_L(f_t) = \frac{\int f_t(x_t, \theta_t)\, w(x_{0:t+L}, \theta_{0:t+L})\, \pi(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})}{\int w(x_{0:t+L}, \theta_{0:t+L})\, \pi(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})} \tag{25}$$

where the importance weight $w(x_{0:t+L}, \theta_{0:t+L})$ is given by

$$w(x_{0:t+L}, \theta_{0:t+L}) \propto \frac{p(x_{0:t+L}, \theta_{0:t+L} \mid y_{1:t+L})}{\pi(x_{0:t+L}, \theta_{0:t+L} \mid y_{1:t+L})} \tag{26}$$

The importance weight can normally only be evaluated up to a constant of
proportionality, since, following from Bayes' rule

$$p(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L}) = \frac{p(y_{1:t+L} \mid x_{0:t+L}, \theta_{0:t+L})\, p(dx_{0:t+L}, d\theta_{0:t+L})}{p(y_{1:t+L})} \tag{27}$$

where the normalizing constant $p(y_{1:t+L})$ can typically not be expressed in closed-form.

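If N i.i.d. samples $(x^{(i)}_{0:t+L}, \theta^{(i)}_{0:t+L})$ can be simulated according to a distribution
$\pi(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$, a Monte Carlo estimate of $I_L(f_t)$ in (25) may be obtained as

$$\hat{I}_{L,N}(f_t) = \frac{\sum_{i=1}^{N} f_t(x^{(i)}_t, \theta^{(i)}_t)\, w(x^{(i)}_{0:t+L}, \theta^{(i)}_{0:t+L})}{\sum_{i=1}^{N} w(x^{(i)}_{0:t+L}, \theta^{(i)}_{0:t+L})} = \sum_{i=1}^{N} \tilde{w}^{(i)}_{0:t+L}\, f_t(x^{(i)}_t, \theta^{(i)}_t) \tag{28}$$

where the normalized importance weights are given by

$$\tilde{w}^{(i)}_{0:t+L} = \frac{w(x^{(i)}_{0:t+L}, \theta^{(i)}_{0:t+L})}{\sum_{j=1}^{N} w(x^{(j)}_{0:t+L}, \theta^{(j)}_{0:t+L})} \tag{29}$$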



This method is equivalent to a point mass approximation of the target distribution of the
form

$$p_N(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L}) = \sum_{i=1}^{N} \tilde{w}^{(i)}_{0:t+L}\, \delta_{x^{(i)}_{0:t+L},\,\theta^{(i)}_{0:t+L}}(dx_{0:t+L}, d\theta_{0:t+L}) \tag{30}$$

The perfect simulation case, when $\pi(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L}) = p(dx_{0:t+L}, d\theta_{0:t+L} \mid y_{1:t+L})$,
corresponds to $\tilde{w}^{(i)}_{0:t+L} = N^{-1}$, i = 1, ..., N. In practice, the importance distribution will be
chosen to be as close as possible to the target distribution in a given sense. For finite N,
$\hat{I}_{L,N}(f_t)$ is biased, since it involves a ratio of estimates, but asymptotically, according to
the strong law of large numbers, $\hat{I}_{L,N}(f_t) \to I_L(f_t)$ almost surely as $N \to +\infty$. Under additional assumptions
a central limit theorem also holds as disclosed in Doucet, Godsill and Andrieu.
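By way of illustration only, the self-normalised importance sampling estimate may be sketched on a toy problem. The target (a standard normal density known only up to a constant), the wider Gaussian importance distribution and the name `self_normalised_is` are assumptions for the example. Weights are handled in the log domain to avoid underflow.

```python
import math
import random

def self_normalised_is(f, log_target, log_proposal, draw, n, rng):
    """Importance sampling with weights known only up to a constant."""
    xs = [draw(rng) for _ in range(n)]
    log_w = [log_target(x) - log_proposal(x) for x in xs]
    top = max(log_w)                                  # stabilise the exponentials
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    w_norm = [v / total for v in w]                   # normalized importance weights
    estimate = sum(wi * f(x) for wi, x in zip(w_norm, xs))
    return estimate, w_norm

SIGMA = 2.0  # standard deviation of the (assumed) importance distribution
estimate, weights = self_normalised_is(
    f=lambda x: x * x,
    log_target=lambda x: -0.5 * x * x,                # N(0, 1) up to a constant
    log_proposal=lambda x: -0.5 * (x / SIGMA) ** 2 - math.log(SIGMA),
    draw=lambda r: r.gauss(0.0, SIGMA),
    n=20000,
    rng=random.Random(0),
)
```

Because the weights are normalised by their sum, any constant factor missing from either density cancels, which is why the unknown normalizing constant causes no difficulty.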

It is possible to reduce the estimation of $p(dx_t, d\theta_t \mid y_{1:t+L})$ and $I_L(f_t)$ to one of sampling
from $p(d\gamma_{0:t+L} \mid y_{1:t+L})$, where we recall that $\gamma_t \triangleq \{x_t, \sigma^2_{v,1:m,t}, \sigma^2_{w,1:n,t}\}$. Indeed,

$$p(d\alpha_t, d\gamma_{0:t+L} \mid y_{1:t+L}) = p(d\alpha_t \mid \gamma_{0:t+L}, y_{1:t+L}) \times p(d\gamma_{0:t+L} \mid y_{1:t+L}) \tag{31}$$

where $\alpha_t$ collects the parameters $a_t$ and $h_t$, and $p(d\alpha_t \mid \gamma_{0:t+L}, y_{1:t+L})$ is a Gaussian distribution whose parameters may be computed
using Kalman filter type techniques. Thus, given an approximation of $p(d\gamma_{0:t+L} \mid y_{1:t+L})$,
an approximation of $p(dx_t, d\theta_t \mid y_{1:t+L})$ may straightforwardly be obtained. Defining the
marginal importance distribution and associated importance weight as

$$\pi(d\gamma_{0:t+L} \mid y_{1:t+L}) = \int \pi(d\alpha_{0:t+L}, d\gamma_{0:t+L} \mid y_{1:t+L}), \qquad w(\gamma_{0:t+L}) \propto \frac{p(\gamma_{0:t+L} \mid y_{1:t+L})}{\pi(\gamma_{0:t+L} \mid y_{1:t+L})} \tag{32}$$

and assuming that a set of samples $\gamma^{(i)}_{0:t+L}$ distributed according to $\pi(d\gamma_{0:t+L} \mid y_{1:t+L})$ is
available, an alternative Bayesian importance sampling estimate of $I_L(f_t)$ follows as

$$\bar{I}_{L,N}(f_t) = \sum_{i=1}^{N} \tilde{w}^{(i)}_{0:t+L} \int p(d\alpha_t \mid \gamma^{(i)}_{0:t+L}, y_{1:t+L})\, f_t(x^{(i)}_t, \theta_t) \tag{33}$$

provided that $\int p(d\alpha_t \mid \gamma_{0:t+L}, y_{1:t+L})\, f_t(x_t, \theta_t)$ can be evaluated in a closed form expression. In
(33) the marginal normalized importance weights are given by

$$\tilde{w}^{(i)}_{0:t+L} = \frac{w(\gamma^{(i)}_{0:t+L})}{\sum_{j=1}^{N} w(\gamma^{(j)}_{0:t+L})}, \quad i = 1, \ldots, N. \tag{34}$$

Intuitively, to reach a given precision, $\bar{I}_{L,N}(f_t)$ will need a reduced number of samples
over $\hat{I}_{L,N}(f_t)$, since it only requires samples from the marginal distribution
$\pi(d\gamma_{0:t+L} \mid y_{1:t+L})$. It can be proved that the variance of the estimates is subsequently
reduced as disclosed in A. Doucet, J.F.G. de Freitas and N.J. Gordon (eds.), Sequential
Monte Carlo Methods in Practice, Springer-Verlag, June 2000, the contents of which
are incorporated herein by reference. In the present case, this is important as at each
time instant the number of parameters is (when assuming that all mixing filters and AR
processes have the same length),

• $m^2 L - mL$ parameters for the mixing filters, where L can be large.

• m (or 1) parameter(s) for the observation noise.

• nl + n parameters for the autoregressive processes.

• n parameters for the sources.

It is not clear which integration will allow for the best variance reduction, but, at least in
terms of search in the parameters space, the integration of the mixing filters and
autoregressive filters seems preferable.
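By way of illustration only, the variance reduction obtained by integrating out part of the unknowns may be sketched on a toy two-variable problem. The model below (x standard normal, y normal about x with unit variance) and the function names are assumptions for the example. The first estimator averages a function of the sampled y, while the second replaces it by its conditional expectation given x, which here is available in closed form.

```python
import random

def crude_and_marginalised_terms(n, rng):
    """Per-sample terms of two estimators of E[y^2] (exact value 2)."""
    crude, marginalised = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # component that is sampled
        y = rng.gauss(x, 1.0)            # component that can be integrated out
        crude.append(y * y)              # uses the sampled y
        marginalised.append(x * x + 1.0) # E[y^2 | x] evaluated analytically
    return crude, marginalised

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

crude, marginalised = crude_and_marginalised_terms(5000, random.Random(0))
```

Both averages estimate the same quantity, but the per-sample variance of the marginalised terms is markedly smaller, so fewer samples are needed for a given precision.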

Given these results, the subsequent discussion will focus on Bayesian importance
sampling methods to obtain approximations of $p(d\gamma_{0:t+L} \mid y_{1:t+L})$ and $I_L(f_t)$ using an
importance distribution of the form $\pi(d\gamma_{0:t+L} \mid y_{1:t+L})$. The methods described up to now
are batch methods. How a sequential method may be obtained is described below.

The importance distribution at discrete time t may be factorized as

$$\pi(d\gamma_{0:t+L} \mid y_{1:t+L}) = \pi(d\gamma_0 \mid y_{1:t+L}) \prod_{k=1}^{t+L} \pi(d\gamma_k \mid \gamma_{0:k-1}, y_{1:t+L}) \tag{35}$$

The aim is to obtain at any time t an estimate of the distribution $p(d\gamma_{0:t+L} \mid y_{1:t+L})$ and to
be able to propagate this estimate in time without modifying subsequently the past
simulated trajectories $\gamma^{(i)}_{0:t+L}$. This means that $\pi(d\gamma_{0:t+L} \mid y_{1:t+L})$ should admit
$\pi(d\gamma_{0:t-1+L} \mid y_{1:t-1+L})$ as marginal distribution. This is possible if the importance distribution is
restricted to be of the general form

$$\pi(d\gamma_{0:t+L} \mid y_{1:t+L}) = \pi(d\gamma_0) \prod_{k=1}^{t+L} \pi(d\gamma_k \mid \gamma_{0:k-1}, y_{1:k}) \tag{36}$$

Such an importance distribution allows recursive evaluation of the importance weights,
i.e. $w(\gamma_{0:t+L}) = w(\gamma_{0:t+L-1})\, w_{t+L}$, and in this particular case

$$w_{t+L} \propto \frac{p(y_{t+L} \mid x_{0:t+L}, \beta_{0:t+L})\, p(x_{t+L} \mid x_{0:t+L-1}, \beta_{0:t+L})\, p(\beta_{t+L} \mid \beta_{t+L-1})}{\pi(\gamma_{t+L} \mid \gamma_{0:t+L-1}, y_{1:t+L})} \tag{37}$$

The quantity $p(dx_{t+L} \mid x_{0:t+L-1}, \beta_{0:t+L})$, with $\beta$ denoting the variances, can be computed up to a normalizing constant using a
one step ahead Kalman filter for the system given by Eq. (19) and $p(y_{t+L} \mid x_{0:t+L}, \beta_{0:t+L})$ can
be computed using a one step ahead Kalman filter of the system given by Eq. (17).
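By way of illustration only, the way a one step ahead Kalman filter yields a predictive density (and hence a term of an incremental importance weight) may be sketched for a scalar linear Gaussian model. The scalar model, the parameter names `a`, `b`, `c`, `d` and the function `kalman_step` are assumptions for the example; the systems (17) and (19) are vector valued.

```python
import math

def kalman_step(m, p, y, a, b, c, d):
    """One predict/update step for x_t = a*x_{t-1} + b*v_t, y_t = c*x_t + d*w_t.

    Returns the updated mean, updated variance and the log predictive
    density of the observation y given the past observations.
    """
    m_pred = a * m
    p_pred = a * p * a + b * b
    y_pred = c * m_pred
    s = c * p_pred * c + d * d                 # innovation variance
    innovation = y - y_pred
    log_pred = -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
    gain = p_pred * c / s
    m_new = m_pred + gain * innovation
    p_new = p_pred - gain * c * p_pred
    return m_new, p_new, log_pred

m, p, log_likelihood = 0.0, 1.0, 0.0
for y in [0.9, 1.1, 1.0]:
    m, p, log_pred = kalman_step(m, p, y, a=1.0, b=0.1, c=1.0, d=0.5)
    log_likelihood += log_pred                 # accumulates the log weight term
```

Only the current mean and variance need to be carried from one step to the next, which is what makes the recursive weight evaluation inexpensive.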

There is an unlimited number of choices for the importance distribution $\pi(d\gamma_{0:t+L} \mid y_{1:t+L})$,
the only restriction being that its support includes that of $p(d\gamma_{0:t+L} \mid y_{1:t+L})$. Two
possibilities are considered next. A possible strategy is to choose at time t + L the
importance distribution that minimizes the variance of the importance weights given $\gamma_{0:t-1}$
and $y_{1:t}$. The importance distribution that satisfies this condition is
$p(d\gamma_{t+L} \mid \gamma_{0:t-1+L}, y_{1:t+L})$, with the associated incremental importance weight given by



(38)
Direct sampling from the optimal importance distribution is difficult, and evaluating the
importance weight is analytically intractable. The aim is thus in general to mimic the
optimal distribution by means of tractable approximations, typically a local linearization of
$p(d\gamma_t \mid \gamma_{0:t-1}, y_{1:t})$. Instead, a mixed suboptimal method is described. We propose to
sample the particles at time t according to two importance distributions $\pi_1$ and $\pi_2$ with
proportions $\alpha$ and $1 - \alpha$ such that the importance weights $w(\gamma^{(i)}_{0:t+L})$ have now the form
(note that it would be possible to draw $N_1$ and $N_2$ randomly according to a Bernoulli
distribution with parameter $\alpha$, but this would increase the estimator variance)



a 



bw) 

*,(YoL.|>w) 



IE* 



0-a) 



* 2 (Yto*t|jW) 



*i(Yo, + Lbw). 

bw) 



IE. 



*2(Yto-JjW). 



(39) 



which in practice is estimated as 



a 



Oj+L 

s£ p(Y & t \y, J+L ) /* «(y bw ) 
p(y l £+L |jw ) / * »(y ff.Jjw ) 

P(Yo',lt|:>W *(Yo-? +i bw) 



(40) 



The importance distribution $\pi_1(dx_{t+L} \mid x_{1:t+L-1}, \beta_{1:t+L}, y_{1:t+L})$ will be taken to be a normal
distribution centered around zero with variance $\sigma_x^2$, and $\pi_2(dx_{t+L} \mid x_{1:t+L-1}, \beta_{1:t+L}, y_{1:t+L})$ is



taken to be p(dx^\m^ L ,P;; h L ^ L , p£ , y,, +L ) which is a Gaussian distribution 

obtained from a one step ahead Kalman filter for the state space model described in (14) 
with m tt(i ,\ , and m* ( ?, . as values for a (i) and h (,) and initial variances ^t+m+L and 



The variances are sampled from their prior distributions, and expression (37) is
used to compute $w_{t+L}$. Note that other importance distributions are possible, but this
approach yields good results and seems to preserve diversity of the samples.

For importance distributions of the form specified by (36) the variances of the 
importance weights can only increase (stochastically) over time, as disclosed by Doucet, 
Godsill and Andrieu and the references therein. It is thus impossible to avoid a 
degeneracy phenomenon. Practically, after a few iterations of the algorithm, all but one 
of the normalized importance weights are very close to zero, and a large computational 
effort is devoted to updating trajectories whose contribution to the final estimate is 
almost zero. For this reason it is of crucial importance to include selection and 
diversity. This is discussed in more detail hereinafter. 
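By way of illustration only, the degeneracy of the weights may be observed numerically. The sketch below multiplies the weights of a set of particles by random incremental factors at each step, without any selection, and tracks the effective sample size, a commonly used diagnostic that is not part of the description above. The incremental log-weights are assumed to be standard normal purely for the example.

```python
import math
import random

def normalise(log_weights):
    top = max(log_weights)
    w = [math.exp(v - top) for v in log_weights]
    total = sum(w)
    return [v / total for v in w]

def effective_sample_size(normalised_weights):
    """Equals N for uniform weights and 1 when one weight carries all the mass."""
    return 1.0 / sum(w * w for w in normalised_weights)

def ess_trace(n_particles, n_steps, rng):
    log_w = [0.0] * n_particles
    trace = []
    for _ in range(n_steps):
        # recursive weight update: multiply by an incremental weight
        log_w = [v + rng.gauss(0.0, 1.0) for v in log_w]
        trace.append(effective_sample_size(normalise(log_w)))
    return trace

trace = ess_trace(100, 50, random.Random(0))
```

The trace typically falls from a sizeable fraction of the particle count towards a value close to one, meaning that a single particle dominates the estimate.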

The purpose of a selection (or resampling) procedure is to discard particles with low
normalized importance weights and multiply those with high normalized importance
weights, so as to avoid the degeneracy of the algorithm. A selection procedure
associates with each particle, say $\tilde{\gamma}^{(i)}_{0:t}$, a number of children $N_i \in \mathbb{N}$, such that
$\sum_{i=1}^{N} N_i = N$, to obtain N new particles $\gamma^{(i)}_{0:t}$. If $N_i = 0$ then $\tilde{\gamma}^{(i)}_{0:t}$ is discarded, otherwise
it has $N_i$ children at time t+1. After the selection step the normalized importance
weights for all the particles are reset to $N^{-1}$, thus discarding all information regarding the
past importance weights. Thus, the normalized importance weight prior to selection in
the next time step is proportional to (37). These will be denoted as $\tilde{w}^{(i)}_t$, since they do
not depend on any past values of the normalized importance weights. If the selection
procedure is performed at each time step, then the approximating distribution before the
selection step is given by

$$\tilde{p}_N(d\gamma_{0:t} \mid y_{1:t}) = \sum_{i=1}^{N} \tilde{w}^{(i)}_t\, \delta_{\tilde{\gamma}^{(i)}_{0:t}}(d\gamma_{0:t}),$$

and the one after the selection step follows as

$$p_N(d\gamma_{0:t} \mid y_{1:t}) = N^{-1} \sum_{i=1}^{N} N_i\, \delta_{\tilde{\gamma}^{(i)}_{0:t}}(d\gamma_{0:t}).$$

Systematic sampling as disclosed by Doucet, de Freitas and Gordon is chosen for its good variance
properties.
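By way of illustration only, a systematic sampling scheme for choosing the number of children of each particle may be sketched as follows. It uses a single uniform random number and N evenly spaced points on the cumulative weight scale. The function name `systematic_resample` is hypothetical, and the weights are assumed to be already normalized.

```python
import random

def systematic_resample(weights, rng):
    """Return the number of children for each particle (counts sum to N)."""
    n = len(weights)
    start = rng.random() / n              # single random offset in [0, 1/N)
    counts = [0] * n
    index, cumulative = 0, weights[0]
    for k in range(n):
        point = start + k / n             # evenly spaced points
        while point > cumulative and index < n - 1:
            index += 1
            cumulative += weights[index]
        counts[index] += 1
    return counts

children = systematic_resample([0.5, 0.5, 0.0, 0.0], random.Random(1))
```

With this scheme the number of children of a particle with normalized weight w is always either the integer part of N w or one more, which keeps the extra randomness introduced by selection small.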
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However selection poses another problem. During the resampling stage, any particular 
particle with a high importance weight will be duplicated many times. In particular, 
when L>0, the trajectories are resampled L times from time t +1 to t + L so that very 
few distinct trajectories remain at time t + L. This is the classical problem of depletion 
of samples. As a result the cloud of particles may eventually collapse to a single 
particle. This degeneracy leads to poor approximation of the distributions of interest. 
Several suboptimal methods have been proposed to overcome this problem and 
introduce diversity amongst the particles. Most of these are based on kernel density 
methods (as disclosed by Doucet, Godsill and Andrieu and by N.J. Gordon, D.J.
Salmond and A.F.M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state
estimation", IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993, the contents of
which are incorporated herein by reference), which approximate the probability
distribution using a kernel density estimate based on the current set of particles, and
sample a new set of distinct particles from it. However, the choice and configuration of
a specific kernel are not always straightforward. Moreover, these methods introduce
additional Monte Carlo variation. It is shown hereinafter how MCMC methods may be
combined with sequential importance sampling to introduce diversity amongst the 
samples without increasing the Monte Carlo variation. 
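By way of illustration only, the depletion of trajectories under repeated selection may be observed on a toy model. The sketch assumes a scalar random walk observed in Gaussian noise, proposes from the prior dynamics, resamples at every step and records how many distinct time-zero ancestors survive. The model, its parameters and the function names are assumptions for the example, and no MCMC step is included.

```python
import math
import random

def systematic_counts(weights, rng):
    """Number of children per particle under systematic sampling."""
    n = len(weights)
    u0 = rng.random() / n
    counts, i, cum = [0] * n, 0, weights[0]
    for k in range(n):
        while u0 + k / n > cum and i < n - 1:
            i += 1
            cum += weights[i]
        counts[i] += 1
    return counts

def run_filter(observations, n, rng, q_std=0.1, r_std=0.5):
    """Assumed toy model: x_t = x_{t-1} + q_std*v_t, y_t = x_t + r_std*w_t."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ancestors = list(range(n))            # time-zero ancestor of each particle
    means, distinct = [], []
    for y in observations:
        # importance sampling step: propose from the prior dynamics
        particles = [x + q_std * rng.gauss(0.0, 1.0) for x in particles]
        log_w = [-0.5 * ((y - x) / r_std) ** 2 for x in particles]
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        w = [v / total for v in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # selection step: multiply or discard according to the weights
        counts = systematic_counts(w, rng)
        particles = [x for x, c in zip(particles, counts) for _ in range(c)]
        ancestors = [a for a, c in zip(ancestors, counts) for _ in range(c)]
        distinct.append(len(set(ancestors)))
    return means, distinct

means, distinct = run_filter([2.0] * 50, 500, random.Random(0))
```

The filtered mean follows the observations, while the number of distinct ancestors can only decrease from step to step, which is the loss of diversity in the early part of the trajectories discussed above.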

An efficient way of limiting sample depletion consists of simply adding an MCMC step
to the simulation-based filter/fixed-lag smoother (see Berzuini and Gilks referred to by
Doucet, Godsill and Andrieu and by C.P. Robert and G. Casella, Monte Carlo Statistical
Methods, Springer Verlag, 1999, the contents of which are incorporated herein by
reference). This introduces diversity amongst the samples and thus drastically reduces
the problem of depletion of samples. Assume that, at time t + L, the particles $\gamma'^{(i)}_{0:t+L}$ are
marginally distributed according to $p(d\gamma_{0:t+L} \mid y_{1:t+L})$. If a transition kernel
$K(\gamma'_{0:t+L} \mid d\gamma_{0:t+L})$ with invariant distribution $p(d\gamma_{0:t+L} \mid y_{1:t+L})$ is applied to each of the
particles, then the new particles $\gamma^{(i)}_{0:t+L}$ are still distributed according to the distribution
of interest. Any of the standard MCMC methods, such as the Metropolis-Hastings
(MH) algorithm or Gibbs sampler, may be used. However, contrary to classical MCMC
methods, the transition kernel does not need to be ergodic. Not only does this method
introduce no additional Monte Carlo variation, but it improves the estimates in the sense
that it can only reduce the total variation norm of the current distribution of the particles
with respect to the target distribution.
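By way of illustration only, the effect of such a move step on a collapsed cloud of particles may be sketched with a random-walk Metropolis-Hastings kernel. The target (a standard normal distribution known up to a constant), the step size and the function names are assumptions for the example; the kernel described hereinafter is a different, model-specific one.

```python
import math
import random

def log_target(x):
    return -0.5 * x * x                       # N(0, 1) up to a constant (assumed)

def mh_move(x, rng, step=0.5):
    """One Metropolis-Hastings transition leaving the target invariant."""
    proposal = x + rng.gauss(0.0, step)       # symmetric random-walk proposal
    log_ratio = log_target(proposal) - log_target(x)
    accept_prob = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    return proposal if rng.random() < accept_prob else x

rng = random.Random(0)
collapsed = [0.3] * 50                        # cloud after heavy duplication
moved = [mh_move(x, rng) for x in collapsed]
```

Each particle is moved independently, so duplicated particles are spread out again while the distribution they are drawn from is left unchanged by the kernel.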



Given at time t + L - 1, $N \in \mathbb{N}^*$ particles $\gamma^{(i)}_{0:t+L-1}$ distributed approximately according to
$p(d\gamma_{0:t+L-1} \mid y_{1:t+L-1})$, the Monte Carlo fixed-lag smoother proceeds as follows at time t + L.

Sequential Importance Sampling Step

• For i = 1, ..., N, sample $\tilde{\gamma}^{(i)}_{t+L} \sim \pi(d\gamma_{t+L} \mid \gamma^{(i)}_{0:t+L-1}, y_{1:t+L})$ and set $\tilde{\gamma}^{(i)}_{0:t+L} = (\gamma^{(i)}_{0:t+L-1}, \tilde{\gamma}^{(i)}_{t+L})$.

• For i = 1, ..., N, compute the normalized importance weights $\tilde{w}^{(i)}_{t+L}$ using (37) and (40).

Selection Step

• Multiply / discard particles $\tilde{\gamma}^{(i)}_{0:t+L}$ with respect to high / low normalised importance
weights to obtain N particles $\gamma'^{(i)}_{0:t+L}$, e.g. using systematic sampling.

MCMC Step

• For i = 1, ..., N, apply to $\gamma'^{(i)}_{0:t+L}$ a Markov transition kernel $K(\gamma'^{(i)}_{0:t+L} \mid d\gamma^{(i)}_{0:t+L})$ with
invariant distribution $p(d\gamma_{0:t+L} \mid y_{1:t+L})$ to obtain N particles $\gamma^{(i)}_{0:t+L}$.

There is an unlimited number of choices for the MCMC transition kernel. Here a
one-at-a-time MH algorithm is adopted that updates at time t + L the values of the Markov
process from time t to t + L. More specifically $\gamma^{(i)}_k$, k = t, ..., t + L, i = 1, ..., N, is
sampled according to $p(d\gamma_k \mid \gamma^{(i)}_{-k}, y_{1:t+L})$, with
$\gamma^{(i)}_{-k} \triangleq (\gamma^{(i)}_{0:t-1}, \gamma^{(i)}_t, \ldots, \gamma^{(i)}_{k-1}, \gamma^{(i)}_{k+1}, \ldots, \gamma^{(i)}_{t+L})$. It is
straightforward to verify that this algorithm admits $p(d\gamma_{0:t+L} \mid y_{1:t+L})$ as invariant
distribution. Sampling from $p(d\gamma_k \mid \gamma^{(i)}_{-k}, y_{1:t+L})$ can be done efficiently via a
backward-forward algorithm of O(L+1) complexity as disclosed in A. Doucet and C. Andrieu,
"Iterative algorithms for optimal state estimation of jump Markov linear systems", in
Proc. Conf. IEEE ICASSP, 1999, the contents of which are incorporated herein by
reference. At time t + L it proceeds as summarised below for the i-th particle.

For k = t + L, ..., t, compute and store $P'^{-1}_{k|k+1} m'_{k|k+1}$ and $P'^{-1}_{k|k+1}$ by running the information
filter defined in (50)-(51) of the Appendix for the two systems (17) and (19).

Forward Step

For k = t, ..., t + L:

- Sample a proposal $\gamma'_k \sim q(d\gamma_k \mid \gamma_{-k}, y_{1:t+L})$, using the proposal distribution in (43).

- Perform one step of the Kalman filter in (48)-(49) for the current value $\gamma^{(i)}_k$ and
the proposed value $\gamma'_k$, and calculate their posterior probabilities using (41),
for the two systems (17) and (19).

- Compute the MH acceptance probability $\alpha(\gamma^{(i)}_k, \gamma'_k)$, as defined in (44).

- If $u \le \alpha(\gamma^{(i)}_k, \gamma'_k)$, where $u \sim U_{[0,1]}$, set $\gamma^{(i)}_k = \gamma'_k$; otherwise leave $\gamma^{(i)}_k$ unchanged.

The target posterior distribution for each of the MH steps is given by 

>P*K>P- 4 ) 

a />OW \*0*+L > Pto +£ )/>(** \ X -t ' P «*♦£ )/>(P* K* , P-, ) 

« />GW |*to*i. . P to +i )P(** |*0:*_, • P to* t > Po:, + t )/>(P* |P-, ) ^ 

+U+L > P k+lJ+L 

* />(P»|P,-, )p(P» + ,|P* )p(x k |p 0 , +i ) \p{x k ^ L \a k , p i+tt+t jx l!t , p fti 
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There is a similarity between the two expressions in and x^. These two terms can be 
computed in an identical way by first using two Kalman filters for system (17) and (19) 
to obtain p (y fc |xo:k.i,Po:k-i) = N(y k ; y l]k _,S k ) and p(x k ) = AT (x, ; x t | 4 _, ) and 
secondly two information filters to obtain 

jp(yt*b*L \ h t > x k+x,+L • P**i»*i \y* > x o* . Po* ) dh t 

o\i, h + n t Rrp*»R:\-+» 

and a similar expression for J p(x Mj + L \a k ^ k+ vj+i)P( a k \ x ix » Po* )^ a k • The different 
matrices involved are defined as follows. Let P k h tk = R k Yl k R k T y where Tl h k e r"**"* is 
the diagonal matrix containing the n h < n h non-zero singular values of 

P k * k , and R k e R***"* is the matrix containing the columns of R k corresponding to the 
non-zero singular values, where P k ^ k = R k Yl h k R k T is the singular value decomposition of 
P k * k The matrix Q k is given by 

To sample from the distribution in (41) using a MH step, the proposal distribution is 
here taken to be: 



^(P* |P*-i . P»*i )«#KP* + i |P» )/KP* |P*_, ) 

■ (42) 

O-j+L * 



and 

?(y t fr -* ,jw MP* |pV, > P, +1 )?(** \x_ k , p 0: , +i , y to+t ). (43) 
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which requires a one step ahead Kalman filter on the system (14). In both cases 
(4>*-i»4>**i ) ~ N( ^ k ~ l »~~)- If the current and proposed new values for the 

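state of the Markov chain are given by $\gamma_k$ and $\gamma'_k$, respectively, the MH acceptance
probability follows as

$$\alpha(\gamma_k, \gamma'_k) = \min\{1, r(\gamma_k, \gamma'_k)\}, \tag{44}$$

with the acceptance ratio given by

$$r(\gamma_k, \gamma'_k) = \frac{p(\gamma'_k \mid \gamma_{-k}, y_{1:t+L})\, q(\gamma_k \mid \gamma_{-k}, y_{1:t+L})}{p(\gamma_k \mid \gamma_{-k}, y_{1:t+L})\, q(\gamma'_k \mid \gamma_{-k}, y_{1:t+L})} \tag{45}$$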
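By way of illustration only, the acceptance probability may be evaluated in the log domain, which avoids overflow when the densities involved are very small. The function name and argument names are hypothetical; the four arguments are the log target density and log proposal density evaluated at the current and at the proposed value.

```python
import math

def mh_acceptance(log_p_current, log_p_proposed, log_q_current, log_q_proposed):
    """min{1, r} with r = p(proposed) q(current) / (p(current) q(proposed))."""
    log_r = (log_p_proposed + log_q_current) - (log_p_current + log_q_proposed)
    return 1.0 if log_r >= 0.0 else math.exp(log_r)

# A proposed value half as probable as the current one, under a proposal
# that favours neither value, is accepted with probability one half.
alpha = mh_acceptance(0.0, math.log(0.5), 0.0, 0.0)
```

Any normalizing constant common to the two target densities cancels in the ratio, so the target need only be known up to a constant of proportionality.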



At each iteration the computational complexity of the Monte Carlo fixed-lag smoother
is O((L + 1)N), and it is necessary to keep in memory the paths of all the trajectories
from time t to t + L, i.e. $\{\gamma^{(i)}_{t:t+L} : i = 1, \ldots, N\}$, as well as the sufficient statistics

The computational complexity of this algorithm at each iteration is clearly O(N). At
first glance, it could appear necessary to keep in memory the paths of all the trajectories
$\{\gamma^{(i)}_{0:t} : i = 1, \ldots, N\}$, so that the storage requirements would increase linearly with time.
In fact, for both the optimal and prior importance distributions, $\pi(\gamma_t \mid \gamma_{0:t-1}, y_{1:t})$ and the
associated importance weights depend on $\gamma_{0:t-1}$ only via a set of low-dimensional
sufficient statistics $\{m^{(i)}_{t|t}, P^{(i)}_{t|t} : i = 1, \ldots, N\}$, and only these values need to be kept in

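memory. Thus, the storage requirements are also O(N) and do not increase over time.

Appendix: Kalman filter recursions

The system

$$x_{t+1} = A_{t+1} x_t + B_{t+1} v_{t+1} \tag{46}$$

$$y_t = C_t x_t + D_t w_t \tag{47}$$

is considered.

The sequence $\varsigma_{1:T}$ being here assumed known, the Kalman filter equations are the
following.

Set $m_{0|0} = m_0$ and $P_{0|0} = P_0$, then for t = 1, ..., T compute

$$\begin{aligned}
m_{t|t-1} &= A_t m_{t-1|t-1} \\
P_{t|t-1} &= A_t P_{t-1|t-1} A_t^T + B_t B_t^T \\
y_{t|t-1} &= C_t m_{t|t-1}
\end{aligned} \tag{48}$$

$$\begin{aligned}
S_t &= C_t P_{t|t-1} C_t^T + D_t D_t^T \\
m_{t|t} &= m_{t|t-1} + P_{t|t-1} C_t^T S_t^{-1} \tilde{y}_{t|t-1} \\
P_{t|t} &= P_{t|t-1} - P_{t|t-1} C_t^T S_t^{-1} C_t P_{t|t-1}
\end{aligned} \tag{49}$$

where $m_{t|t-1} = E\{x_t \mid y_{1:t-1}, \varsigma_{1:t}\}$, $m_{t|t} = E\{x_t \mid y_{1:t}, \varsigma_{1:t}\}$, $\tilde{y}_{t|t-1} = y_t - y_{t|t-1}$,
$P_{t|t-1} = \mathrm{cov}\{x_t \mid y_{1:t-1}, \varsigma_{1:t}\}$, $P_{t|t} = \mathrm{cov}\{x_t \mid y_{1:t}, \varsigma_{1:t}\}$, $y_{t|t-1} = E\{y_t \mid y_{1:t-1}, \varsigma_{1:t}\}$ and
$S_t = \mathrm{cov}\{y_t \mid y_{1:t-1}, \varsigma_{1:t}\}$. The likelihood $p(y_{1:T} \mid \varsigma_{1:T})$ is estimated as

$$p(y_{1:T} \mid \varsigma_{1:T}) = \prod_{t=1}^{T} \mathcal{N}(\tilde{y}_{t|t-1};\, 0,\, S_t)$$

The backward information filter proceeds as follows from time t + L to t.

and for k = t + L - 1, ..., 1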



X P k + \\k + \ m k+\\k+\ 



pi{:=p'^+c T k {D t Dirc k 



(50) 



CLAIMS:

1. A method of extracting at least one desired signal from a system comprising at
least one measured contaminated signal and at least one communication channel via 
which the at least one contaminated signal is measured, comprising modelling the 
system as a state space model in which the at least one desired signal has first 
characteristics and the at least one communication channel has second characteristics 
which are different from the first characteristics. 

2. A method as claimed in claim 1, in which the first characteristics are fixed and 
known. 

3. A method as claimed in claim 1, in which the first characteristics are time- 
varying. 

4. A method as claimed in claim 1 or 3, in which the second characteristics are
fixed and known. 

5. A method as claimed in any one of claims 1 to 3, in which the second 
characteristics are time-varying. 

6. A method as claimed in claim 5 when dependent on claim 3, in which the first 
time-varying characteristics vary on average more quickly than the second time-varying 
characteristics. 

7. A method as claimed in any one of the preceding claims, in which the at least 
one communication channel comprises a plurality of communication channels and the at 
least one contaminated signal comprises a plurality of contaminated signals. 

8. A method as claimed in any one of the preceding claims, in which the at least 
one desired signal comprises a plurality of desired signals.

9. A method as claimed in claim 7 or in claim 8 when dependent on claim 7, in
which the number of communication channels is greater than or equal to the number of 
desired signals. 

10. A method as claimed in claim 8 or in claim 9 when dependent on claim 8, in 
which the at least one contaminated signal comprises a linear combination of time- 
delayed versions of at least some of the desired signals. 

11. A method as claimed in any one of the preceding claims, in which the at least 
one contaminated signal comprises the at least one desired signal contaminated with 
noise. 

12. A method as claimed in any one of the preceding claims, in which the at least 
one channel comprises a plurality of signal propagation paths of different lengths. 

13. A method as claimed in any one of the preceding claims, in which the at least 
one desired signal comprises a sound signal. 

14. A method as claimed in claim 13, in which the at least one sound signal comprises
speech. 

15. A method as claimed in claim 13 or 14 when dependent on claim 8, in which the 
contaminated signals are measured by spatially sampling a sound field. 

16. A method as claimed in any one of the preceding claims, in which the at least 
one desired signal is modelled as a time-varying autoregression. 

17. A method as claimed in any one of claims 1 to 15, in which the at least one
desired signal is modelled as a moving average model. 

18. A method as claimed in any one of claims 1 to 15, in which the at least one 
desired signal is modelled as a non-linear time-varying model.

19. A method as claimed in any one of the preceding claims, in which the at least
one communication channel is modelled as a finite impulse response model. 

20. A method as claimed in any one of claims 1 to 18, in which the at least one
communication channel is modelled as an infinite impulse response model. 

21. A method as claimed in any one of claims 1 to 18, in which the at least one
communication channel is modelled as a non-linear time-varying model. 

22. A method as claimed in any one of the preceding claims, in which the state 
space model has at least one parameter which is modelled using a probability model. 

23. A method as claimed in claim 22, in which the at least one desired signal is 
extracted by a Bayesian inversion. 

24. A method as claimed in claim 23, in which the Bayesian inversion is performed 
by a sequential Monte Carlo method or particle filter. 

25. A program for controlling a computer to perform a method as claimed in any 
one of the preceding claims. 

26. A carrier containing a program as claimed in claim 25. 

27. A computer programmed by a program as claimed in claim 25.